skip to main content


Search for: All records

Creators/Authors contains: "He, Weifeng"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available June 23, 2024
  2. Energy-efficient bitcoin mining cores have gained significant attention since the energy cost for computing dominates the mining expenses [1]. Ultra-low-voltage (ULV) digital circuits have emerged as an attractive approach to improve the energy-efficiency. However, they demand a large timing margin for the worst-case process, voltage, and temperature (PVT) variations, undermining a significant portion of energy savings. Recent works, including multi-phase latch pipeline [1], tunable replica circuits [2]–[3], in-situ error detection and correction (EDAC) [4]–[6], and dynamic timing enhancement [7], can reduce the pessimistic margin. However, it is not straightforward to adopt those techniques in mining cores due to their deeply-pipelined architecture (up to 128 stages [1]). For example, to adopt EDAC, the deep pipeline requires inserting many bulky error detectors as it has many critical paths. Our experiment with a 0.3V 28-nm mining core shows >18.9% registers need to be replaced with error detectors, considering 6σ local process variation only. Also, multiple stages can have timing errors simultaneously, making an error correction process (e.g., clock gating [5], VDD boosting [6]) complex and costly. 
    more » « less
  3. Emerging applications like a drone and an autonomous vehicle require system-on-a-chips (SoCs) with high reliability, e.g., the mean-time-between-failure (MTBF) needs to be over tens of thousands of hours [1]. Meanwhile, as these applications require increasingly higher performance and energy efficiency, a multi-core architecture is often desirable. Here, each core operates in an independent voltage/frequency (V/F) domain, ideally from the near-threshold voltage (NTV) to super-threshold, while communicating with one another via a network-on-chip (NoC) [2]. However, this makes it challenging to ensure robustness in clock domain crossing against metastability. Metastability becomes even more critical to NTV circuits since metastability resolution time constant T grows super-linearly with voltage scaling [3]. Conventionally, an NoC uses multi-stage (4 stages in [4]) synchronizers to improve MTBF, but they increase latency and cannot completely eliminate metastability. Recently, [5] proposed a novel NTV flip-flop, which has a lower probability of having metastability. Another recent work [6] proposed to detect the necessary condition of metastability and mitigate it by modulating the RX clock and also requesting retransmission to guarantee data correctness. However, as it detects a necessary condition, not actual metastability, it tends to overly request retransmission, hurting latency, throughput, and energy efficiency. 
    more » « less
  4. As the capacity of DRAM continues to grow, the refresh operation rapidly becomes the performance and power-efficiency bottleneck. Also, restore time, the time given for recharging cells post access, makes an increasingly large amount of negative impact on performance. To tackle these problems, in this paper, we propose an in-situ charge detection and adaptive data restoration DRAM (CDAR-DRAM) architecture, which can dynamically adjust the refresh rate and also relax the constraints on restore time. The proposed CDAR-DRAM employs a low-cost skewed-inverter-based detector, which can reduce the excessive timing margins that prior work added to guarantee the functionality of leaky DRAM cells under the worst-case temperature condition. Moreover, an adaptive DRAM refresh and restore scheme is proposed, which can switch automatically between two modes: (i) a refresh mode that supports adaptive refresh rate, and (ii) a restore mode that relaxes the constraints on restore time dynamically for cells having sufficient charge. With the transistor-and architecture-level simulations, we evaluate the CDAR-DRAM in an 8-core system across different workloads. Compared with the prior art, the proposed architecture achieves a 9.4% improvement in system performance and a 14.3% reduction in energy consumption, without requiring the time-consuming profiling process which many prior works employed. 
    more » « less
  5. null (Ed.)
  6. Emerging embedded systems, such as autonomous robots/vehicles, demand a new system-on-a-chip (SoC) that is ultra-low power (mW or even sub-mW level) but highly robust. Such an SoC typically integrates heterogeneous building blocks for supporting a range of features, each ideally operating in an independent voltage and frequency (V/F) domain [1]. In such an architecture, a network-on-chip (NoC) has played a key role to enable high-speed and energy-efficient networking. However, it is increasingly challenging to meet a robustness target since each V/F domain uses a significantly different voltage, e.g., from nominal 1V to near-threshold voltage (NTV), and clock frequency, e.g., from hundreds of MHz to sub-MHz. Furthermore, any two clocks may have uncertain and time-varying phase and frequency relationships. These properties significantly worsen robustness, particularly metastability, in an NoC. 
    more » « less